Facial Recognition

This project was part of my advanced topics in machine learning class. The task was to use a convolutional neural network to find faces in images.

The Setup

The Modules

Import the modules we will need

We train on the GPU if possible.
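In PyTorch this is the usual one-liner (a sketch; the notebook's exact variable name may differ):

```python
import torch

# Use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```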

Import the data

We connect to the Google Drive where the zip file is located.

We unzip the file and load the images directly into RAM for faster access.

The torchvision module does most of the work for us. We only need to point it at the folder that contains the two image folders, and it will automatically assign a class to each (0 for non-face and 1 for face). We use a data loader that shuffles the data and creates batches of size 64.

A peek at the data

Let's look at a few examples from both sets.

The convolutional binary classifier

Initialization

This will be our base model.
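The architecture itself isn't shown here, so the following is a hypothetical stand-in: a small stack of conv/ReLU/max-pool blocks followed by a fully connected layer that outputs a face probability, assuming 64×64 RGB crops.

```python
import torch
import torch.nn as nn

class FaceClassifier(nn.Module):
    """Hypothetical base model: the layer sizes and the 64x64 input
    resolution are assumptions, not the author's exact network."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 1),
            nn.Sigmoid(),  # probability that the crop is a face
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = FaceClassifier()
out = model(torch.randn(2, 3, 64, 64))
```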

Define the helper functions

The accuracy is given as the number of correct predictions divided by the total number of predictions, so it's a value between 0 and 1.
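As a plain-Python sketch:

```python
def accuracy(predictions, targets):
    """Fraction of correct predictions: a value between 0 and 1."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct -> 0.75
```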

The evaluate_model function calculates the performance of the model on a dataset, and returns loss and accuracy on that set.

The train_model function works in a similar fashion to the evaluation function, but adds backpropagation to improve the model.

For each epoch, the run function trains the model on the training dataset and tracks the loss and accuracy on both the training and validation sets.
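The three functions could be sketched like this. This is a simplified stand-in: the loss function, optimizer, and learning rate are assumptions, and the demo model at the end is a toy, not the CNN.

```python
import torch
import torch.nn as nn

def evaluate_model(model, loader, loss_fn):
    """Loss and accuracy of `model` on a dataset, without gradients."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():
        for x, y in loader:
            out = model(x)
            total_loss += loss_fn(out, y).item() * len(y)
            correct += ((out > 0.5).float() == y).sum().item()
            count += len(y)
    return total_loss / count, correct / count

def train_model(model, loader, loss_fn, optimizer):
    """Same pass as evaluation, plus backpropagation."""
    model.train()
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

def run(model, train_loader, val_loader, epochs, lr=1e-3):
    """Per epoch: train, then record (loss, accuracy) on both sets."""
    loss_fn = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        train_model(model, train_loader, loss_fn, optimizer)
        history.append((evaluate_model(model, train_loader, loss_fn),
                        evaluate_model(model, val_loader, loss_fn)))
    return history

# Tiny synthetic demo: 8 random "images" with random 0/1 labels.
x = torch.randn(8, 3, 8, 8)
y = (torch.rand(8, 1) > 0.5).float()
loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(x, y), batch_size=4)
demo = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 1), nn.Sigmoid())
history = run(demo, loader, loader, epochs=2)
```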

Training the convolutional model

We will train the model on the whole dataset for 10 epochs.

Let's plot the losses and accuracy on both the training and validation set.
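A sketch of such a plot with matplotlib (the history keys below are hypothetical names, not the notebook's actual variables):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

def plot_history(history):
    """history: list of dicts with hypothetical keys
    'train_loss', 'val_loss', 'train_acc', 'val_acc'."""
    epochs = range(1, len(history) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(epochs, [h["train_loss"] for h in history], label="train")
    ax_loss.plot(epochs, [h["val_loss"] for h in history], label="validation")
    ax_loss.set_xlabel("epoch"); ax_loss.set_ylabel("loss"); ax_loss.legend()
    ax_acc.plot(epochs, [h["train_acc"] for h in history], label="train")
    ax_acc.plot(epochs, [h["val_acc"] for h in history], label="validation")
    ax_acc.set_xlabel("epoch"); ax_acc.set_ylabel("accuracy"); ax_acc.legend()
    return fig

history = [{"train_loss": 0.6, "val_loss": 0.65, "train_acc": 0.70, "val_acc": 0.68},
           {"train_loss": 0.4, "val_loss": 0.50, "train_acc": 0.85, "val_acc": 0.80}]
fig = plot_history(history)
```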

Saving/Loading the model
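The standard PyTorch pattern is to save and restore the state_dict, sketched here with a stand-in model:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # stand-in for the trained classifier

# Save only the learned parameters (the recommended PyTorch pattern).
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

# To load, re-create the architecture and restore the weights.
restored = nn.Linear(4, 1)
restored.load_state_dict(torch.load(path))
```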

The face detector

Initialization

The detect_single_scale function takes a large image and runs the model over it. It computes a matrix of probabilities indicating where a face is most likely to be, and uses a threshold to return the locations with the highest probabilities.
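A much-simplified sketch of the idea, with a stub scoring function standing in for the CNN (the real function's signature and internals are not shown above):

```python
def detect_single_scale(image, score_fn, window=2, stride=1, threshold=0.5):
    """Slide a fixed-size window over `image` (a 2D list of values),
    score each crop with `score_fn`, and keep locations whose score
    clears `threshold`. Returns (top, left, probability) tuples."""
    h, w = len(image), len(image[0])
    detections = []
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            crop = [row[left:left + window] for row in image[top:top + window]]
            p = score_fn(crop)
            if p >= threshold:
                detections.append((top, left, p))
    return detections

# Stub scorer: the mean pixel value of the crop stands in for the CNN output.
def mean_score(crop):
    return sum(sum(row) for row in crop) / (len(crop) * len(crop[0]))

image = [[0.0, 0.0, 0.0],
         [0.0, 0.9, 0.9],
         [0.0, 0.9, 0.9]]
hits = detect_single_scale(image, mean_score)  # only the bottom-right window passes
```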

The detect_multi_scale function also takes an image, but it runs the detect_single_scale function on the image resized by several different scale factors, producing a lot of candidate locations. The nms module then merges the overlapping bounding boxes and returns a list of final locations.
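The nms module's internals aren't shown here, but greedy IoU-based suppression is the standard approach; a plain-Python sketch:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop any remaining box that overlaps it too much, repeat.
    Returns the indices of the boxes that survive."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is suppressed
```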

Let's define two helper functions, one to get the images and one for drawing.

Now let's run the model on our full images.

We initialize our face detector and set a threshold: the confidence level above which a window is recognized as a face.

Single Scale

Let's look at what the model sees for a single scale.

The brighter the color, the higher the predicted probability of being a face.

Without the suppression algorithm (nms=False)

Now let the model find the faces. The green boxes show where the faces actually are, and the red boxes where the model predicted faces.

With the suppression algorithm (nms=True)

We see that the model performs surprisingly well, but it is sometimes a bit trigger happy. The problem is that with a higher threshold it won't detect all the faces, so there is a tradeoff between detecting too many faces and too few.

Improving the base model

We will try to improve the model, with 5 different approaches.

Model1

Hypothesis: a higher learning rate and SGD with momentum will make the model converge faster

The changes
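The optimizer change might look like this (the exact learning rate and momentum values are assumptions, and the linear model is a stand-in for the CNN):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # stand-in for the convolutional classifier

# Hypothetical values: the notebook's actual lr and momentum are not shown.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One illustrative update step on random data.
before = model.weight.detach().clone()
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()
```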

Model2

Hypothesis: more channels will allow the model to find more patterns and thus give a higher accuracy

The changes

Model3

Hypothesis: bigger kernel size may give more context and thus lead to better results

The changes

Model4

Hypothesis: a smaller batch size leads to longer computation time but better results

The changes

Model5

Hypothesis: the more layers, the better

The changes

Which is the best model?

Despite the many different configurations, the models perform very similarly on the validation set. It may simply be hard to score any higher on this validation dataset. In terms of accuracy, Model3 seems to be the best, so we will use it to run on the full-scale images.

Run the best model on the full images

We see pretty great results, though the detections are very sensitive to the threshold.